Glocal alignment: finding rearrangements during alignment
Authors
Abstract
MOTIVATION: To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the conserved biological features between distant species. The two main classes of pairwise alignments are global alignment, where one string is transformed into the other, and local alignment, where all locations of similarity between the two strings are returned. Global alignments are less prone to demonstrating false homology as each letter of one sequence is constrained to being aligned to only one letter of the other. Local alignments, on the other hand, can cope with rearrangements between non-syntenic, orthologous sequences by identifying similar regions in sequences; this, however, comes at the expense of a higher false positive rate due to the inability of local aligners to take into account overall conservation maps.

RESULTS: In this paper we introduce the notion of glocal alignment, a combination of global and local methods, where one creates a map that transforms one sequence into the other while allowing for rearrangement events. We present Shuffle-LAGAN, a glocal alignment algorithm that is based on the CHAOS local alignment algorithm and the LAGAN global aligner, and is able to align long genomic sequences. To test Shuffle-LAGAN we split the mouse genome into BAC-sized pieces, and aligned these pieces to the human genome. We demonstrate that Shuffle-LAGAN compares favorably in terms of sensitivity and specificity with standard local and global aligners. From the alignments we conclude that about 9% of human/mouse homology may be attributed to small rearrangements, 63% of which are duplications.
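The abstract does not spell out how the map is built, so the following is only a toy illustration of the general idea of a map that is ordered along one sequence while allowing rearrangements in the other. It is not the Shuffle-LAGAN algorithm. It assumes a list of local hits is already available from some local aligner; the `Hit` record, its fields, and both function names are hypothetical, and strand is ignored. The sketch selects the highest-scoring set of hits that do not overlap in sequence 1, with no ordering constraint in sequence 2, and then flags places where the selected chain jumps backwards in sequence 2.

```python
from bisect import bisect_right
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical local hit: half-open intervals [s1, e1) and [s2, e2)."""
    s1: int
    e1: int
    s2: int
    e2: int
    score: float


def chain_ordered_in_seq1(hits):
    """Pick the highest-scoring hits that do not overlap in sequence 1.

    Positions in sequence 2 are left unconstrained, so the chain may
    jump around there (a rearrangement in this toy model).
    """
    hits = sorted(hits, key=lambda h: h.e1)
    ends = [h.e1 for h in hits]
    best = [0.0] * (len(hits) + 1)  # best[i]: best total using the first i hits
    take = [False] * len(hits)
    prev = [0] * len(hits)
    for i, h in enumerate(hits):
        # Count earlier hits that end at or before this hit starts in sequence 1.
        p = bisect_right(ends, h.s1, 0, i)
        prev[i] = p
        if best[p] + h.score > best[i]:
            best[i + 1] = best[p] + h.score
            take[i] = True
        else:
            best[i + 1] = best[i]
    # Walk back through the table to recover the chosen hits.
    chain = []
    i = len(hits)
    while i > 0:
        if take[i - 1]:
            chain.append(hits[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    chain.reverse()
    return chain


def backward_jumps(chain):
    """Indices i where chain[i] starts before chain[i - 1] ends in sequence 2."""
    return [i for i in range(1, len(chain)) if chain[i].s2 < chain[i - 1].e2]


hits = [
    Hit(0, 10, 100, 110, 5.0),
    Hit(10, 20, 50, 60, 5.0),
    Hit(5, 15, 105, 115, 3.0),
    Hit(20, 30, 60, 70, 4.0),
]
chain = chain_ordered_in_seq1(hits)
print(chain)
print(backward_jumps(chain))
```

A purely global aligner would have to drop either the first hit or the two that follow it, because they are out of order in sequence 2; dropping the ordering constraint on one sequence keeps all three.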
Related resources
Index-based map-to-sequence alignment in large eukaryotic genomes
Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and mapping technologies (e.g. optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kbp–2 Mbp) and thus provide a unique source of information for disamb...
OPTIMA: sensitive and accurate whole-genome alignment of error-prone genomic maps by combinatorial indexing and technology-agnostic statistical analysis
BACKGROUND Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome-mapping technologies (for example, optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kilo base pairs (kbp) to 2 Mbp) and thus pr...
Handling Rearrangements in DNA Sequence Alignment
Sequence alignment is one of the core problems of bioinformatics, with a broad range of applications such as genome assembly, gene identification, and phylogenetic analysis [1]. Alignments between DNA sequences are used to infer evolutionary or functional relationships between genes. Evolution occurs through DNA mutations, which include small-scale edits and larger-scale rearrangement events. T...
An Improved Algorithm for Genome Rearrangements
A remarkable pattern of evolution is that many species have closely related gene sequences but differ dramatically in gene order. This raises a new challenge in aligning two genome sequences: we have to consider changes at both the nucleotide level and the locus level, such as gene rearrangements, duplication or loss. Finding the series of rearrangements at the same time with changes at nuc...
Computational Biology Lecture 18: Genome rearrangements, finding maximal matches
One possibility is to perform a global alignment of the two strings x and y with a special scoring scheme; for instance, +1 for a match, 0 for a mismatch, and 0 for a gap. Then we could identify all the maximal positively scoring chunks of the alignment. The disadvantages of this approach are that it requires O(mn) running time, might not obtain all candidate matches, and obtains matches that are...
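The scoring scheme in this excerpt can be sketched as follows. The lecture's own procedure is cut off above, so this is one plausible reading rather than the lecturer's method: with +1 for a match and 0 for mismatches and gaps, the dynamic program is the same as the longest-common-subsequence recurrence, and a "positively scoring chunk" is taken here to mean a maximal run of consecutive matched columns. The function name `match_chunks` and the output format are assumptions.

```python
def match_chunks(x, y):
    """Globally align x and y with match=+1, mismatch=0, gap=0.

    Returns the maximal runs of consecutive matched positions as
    (start_in_x, start_in_y, length) tuples.
    """
    m, n = len(x), len(y)
    # score[i][j]: best score aligning x[:i] with y[:j]; O(mn) time and space.
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = score[i - 1][j - 1] + (1 if x[i - 1] == y[j - 1] else 0)
            score[i][j] = max(diag, score[i - 1][j], score[i][j - 1])

    # Trace back one optimal alignment, keeping only the matched columns.
    pairs = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1] and score[i][j] == score[i - 1][j - 1] + 1:
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()

    # Merge matched columns that are adjacent in both strings into chunks.
    chunks = []
    for a, b in pairs:
        if chunks and chunks[-1][0] + chunks[-1][2] == a and chunks[-1][1] + chunks[-1][2] == b:
            sx, sy, length = chunks[-1]
            chunks[-1] = (sx, sy, length + 1)
        else:
            chunks.append((a, b, 1))
    return chunks


print(match_chunks("ACGTTT", "ACGAAATTT"))
```

The traceback recovers only one optimal alignment, which illustrates the excerpt's point that this approach might not obtain all candidate matches.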
Journal title:
- Bioinformatics
Volume 19 Suppl 1, Issue
Pages -
Publication date 2003